
Conversation


@mredenti mredenti commented Oct 6, 2025

This PR introduces SQLAlchemy Core to abstract database interactions in ReFrame's storage backend, adding support for PostgreSQL alongside SQLite. It serves as a proof of concept to gather feedback on this approach before adding comprehensive tests and documentation. I have manually tested PostgreSQL with psycopg2 for schema creation, storage, and CLI output. Full PostgreSQL tests in test_storage.py will follow approval.

Motivation

Supporting multiple SQL dialects (SQLite, PostgreSQL, MySQL) is complex due to differences in:

  • Connection APIs (e.g., PostgreSQL needs drivers like psycopg2 or pg8000)
  • Schema syntax (e.g., SQLite's AUTOINCREMENT vs. PostgreSQL's SERIAL)
  • Transaction behaviors and foreign key handling
  • Risk of UUID collisions in concurrent sessions with application-generated IDs

SQLAlchemy Core unifies these interactions behind a single expression language, improving maintainability and extensibility, as the sketch below illustrates.
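For concreteness, here is a minimal sketch of the idea (illustrative only, not code from this PR; the table and URL are made up): a single Core expression is compiled to dialect-specific SQL by whichever engine executes it, so the storage layer never embeds SQLite- or PostgreSQL-specific strings.

```python
# Minimal sketch: one Core expression, multiple dialects.
import sqlalchemy as sa

metadata = sa.MetaData()
sessions = sa.Table(
    'sessions', metadata,
    sa.Column('uuid', sa.Text, primary_key=True),
    sa.Column('session_start_unix', sa.Float)
)

# The same expression object works for any dialect...
stmt = sa.select(sessions).where(sessions.c.session_start_unix >= 0)

# ...because each engine compiles it to its own SQL:
engine = sa.create_engine('sqlite:///:memory:')  # or a postgresql+psycopg2:// URL
metadata.create_all(engine)
with engine.begin() as conn:
    rows = conn.execute(stmt).fetchall()
```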

Next Steps (pending feedback)

  • Add PostgreSQL tests to test_storage.py
  • Update documentation for PostgreSQL configuration
  • Implement database-generated IDs for session UUIDs
  • Support report caching for retry on storage failures

Note: Users can replace psycopg2 with their preferred PostgreSQL driver (e.g., pg8000) by updating requirements.txt and postgresql_driver in the system configuration. Suggestions on test cases or PostgreSQL edge cases are welcome.

…e-only behaviour and schema layout.

- Introduce _ConnectionStrategy class and _SqliteConnector subclass to
  encapsulate URL building, engine creation, and per-dialect kwargs.
- Build schema with SQLAlchemy Core (MetaData, Table, Column), preserving table/column names and the index_testcases_time index.
- Convert INSERT, SELECT, DELETE queries to Core expressions.
- Enable SQLite foreign keys via PRAGMA foreign_keys=ON on each connection
  (matches previous cascade semantics).
- Replace raw sqlite3 connection/locking with SQLAlchemy transactional
  contexts (engine.begin()).
- Drop legacy file lock code and direct sqlite3.connect() calls.
- Remove uuids_sql raw string building in remove paths; use SQLAlchemy's
  in_() instead. (The first sketch after this list shows the PRAGMA hook
  and the in_()-based delete.)
- Introduce _PostgresConnector and wire backend='postgresql' in
  StorageBackend.create().
- Read connection parameters from the site config (storage/0/postgresql_driver,
  postgresql_host, postgresql_port, postgresql_db, postgresql_conn_timeout) and
  credentials from the environment (RFM_POSTGRES_USER / RFM_POSTGRES_PASSWORD);
  see the second sketch after this list.
- Pass connect_timeout via DBAPI connect args.
- Extend config schema (schemas/config.json) with the new Postgres options
  and add sensible defaults (driver=psycopg2, conn_timeout=60).
- Keep SQLite as-is; no behavior change for existing users.
- Add psycopg2-binary==2.9.8 to requirements.txt to provide the PG driver.
- Add _ConnectionStrategy.json_column_type property (default Text, overridden
  to JSONB for Postgres).
- Use this type in schema so Postgres stores sessions.json_blob as JSONB.
- Normalize JSONB values in _decode_sessions() by serializing non-strings to
  JSON strings for consistent downstream parsing.
- CLI: display backend info depending on storage backend
  (SQLite file path vs PostgreSQL host:port/db).
- Write the report as encoded JSON when using PostgreSQL.
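As referenced in the list above, a minimal sketch of the two SQLite-specific points, the foreign-key PRAGMA and the in_()-based delete, assuming an `engine` and the `sessions` table from the Core schema (exact ReFrame names may differ):

```python
import sqlalchemy as sa
from sqlalchemy import event

# SQLite disables foreign keys by default; enable them on every new
# DBAPI connection so ON DELETE CASCADE keeps working as before.
@event.listens_for(engine, 'connect')
def _enable_sqlite_fks(dbapi_conn, connection_record):
    cursor = dbapi_conn.cursor()
    cursor.execute('PRAGMA foreign_keys=ON')
    cursor.close()

# Delete sessions by UUID with in_() instead of hand-built SQL strings
def remove_sessions(uuids):
    with engine.begin() as conn:  # transactional context: commit/rollback handled
        conn.execute(sessions.delete().where(sessions.c.uuid.in_(uuids)))
```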
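And a hypothetical sketch of how the Postgres connector could assemble its URL from the config options and environment variables listed above; the option names follow this commit, while the config accessor and class internals are assumptions:

```python
import os
from sqlalchemy.engine import URL

class _PostgresConnector(_ConnectionStrategy):
    def __init__(self, site_config):
        # Option names as in schemas/config.json; `get()` is illustrative
        self._driver = site_config.get('storage/0/postgresql_driver')    # default: psycopg2
        self._host = site_config.get('storage/0/postgresql_host')
        self._port = site_config.get('storage/0/postgresql_port')
        self._db = site_config.get('storage/0/postgresql_db')
        self._timeout = site_config.get('storage/0/postgresql_conn_timeout')  # default: 60

    def _build_connection_url(self):
        # Credentials come from the environment, not the config file
        return URL.create(
            drivername=f'postgresql+{self._driver}',
            username=os.environ['RFM_POSTGRES_USER'],
            password=os.environ['RFM_POSTGRES_PASSWORD'],
            host=self._host, port=self._port, database=self._db
        ).render_as_string(hide_password=False)

    @property
    def _connection_kwargs(self):
        # Forwarded as DBAPI connect args (create_engine(connect_args=...))
        return {'connect_timeout': self._timeout}
```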

codecov bot commented Oct 6, 2025

Codecov Report

❌ Patch coverage is 78.51240% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.04%. Comparing base (ce5a2e3) to head (c11bf3f).
⚠️ Report is 5 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| reframe/frontend/reporting/storage.py | 81.25% | 21 Missing ⚠️ |
| reframe/frontend/cli.py | 44.44% | 5 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (ce5a2e3) and HEAD (c11bf3f).

HEAD has 13 uploads less than BASE:

| Flag | BASE (ce5a2e3) | HEAD (c11bf3f) |
|---|---|---|
|  | 23 | 10 |
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##           develop    #3567       +/-   ##
============================================
- Coverage    91.22%   69.04%   -22.18%
============================================
  Files           62       60        -2
  Lines        13525    13483       -42
============================================
- Hits         12338     9310     -3028
- Misses        1187     4173     +2986
```

☔ View full report in Codecov by Sentry.

@mredenti mredenti changed the title Feat/storage sqlalchemy [feat] Addind support for PostgreSQL storage via SQLAlchemy Oct 7, 2025
@mredenti mredenti changed the title [feat] Addind support for PostgreSQL storage via SQLAlchemy [feat] Adding support for PostgreSQL storage via SQLAlchemy Oct 7, 2025
@vkarak vkarak self-requested a review October 7, 2025 22:16
@vkarak vkarak added the request for enhancement, reporting (Issues related to reporting and processing the test results) and prio: normal labels Oct 7, 2025
@vkarak vkarak moved this to Todo in ReFrame Backlog Oct 7, 2025

@vkarak vkarak left a comment


I had a quick look over the key idea and I generally agree with the abstraction. I still have some questions, though, that I'd like to have clarified and tested before moving forward, because we want to ensure that the current functionality and performance are not compromised (check my comments in the code).

Also, how easy would it be with this design to add another backend, such as DuckDB?

```python
        # Update DB file mode
        os.chmod(self.__db_file, self.__db_file_mode)

    def _db_create_indexes(self):
```
vkarak (Contributor) commented:

Where are the indexes created in the new design? These are crucial for the performance of time-based queries.

mredenti (Author) replied:

  • During the _db_schema() call, the full database schema (tables, columns, foreign keys and indexes) is declared:
```python
def _db_schema(self):
    self.__sessions_table = Table(
        "sessions", self.__metadata,
        Column("uuid", Text, primary_key=True),
        Column("session_start_unix", Float),
        ...,
        Index("index_sessions_time", "session_start_unix"),
    )

    self.__testcases_table = Table(
        "testcases", self.__metadata,
        Column("job_completion_time_unix", Float),
        ...,
        Index("index_testcases_time", "job_completion_time_unix"),
    )
```

The schema definitions are then translated into SQL statements and executed against the database when _db_create() calls:

```python
self.__metadata.create_all(self.__connector.engine)
```

create_all() is idempotent: it creates any missing tables and indexes without altering existing ones.

  • As a quick check, I have inspected the SQLite database generated when running

```
./test_reframe.py -- unittests/test_reporting.py
```

and this shows that the indexes are there:

```console
sqlite3 /tmp/pytest-of-mredenti/pytest-0/test_storage_api0/.reframe/reports/results.db
SQLite version 3.45.1 2024-01-30 16:01:20
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE sessions (
        uuid TEXT NOT NULL, 
        session_start_unix FLOAT, 
        session_end_unix FLOAT, 
        json_blob TEXT, 
        report_file TEXT, 
        PRIMARY KEY (uuid)
);
CREATE INDEX index_sessions_time ON sessions (session_start_unix);
CREATE TABLE metadata (
        schema_version TEXT
);
CREATE TABLE testcases (
        name TEXT, 
        system TEXT, 
        partition TEXT, 
        environ TEXT, 
        job_completion_time_unix FLOAT, 
        session_uuid TEXT, 
        uuid TEXT, 
        FOREIGN KEY(session_uuid) REFERENCES sessions (uuid) ON DELETE CASCADE
);
CREATE INDEX index_testcases_time ON testcases (job_completion_time_unix);
```

Of course, it is perhaps worth adding a test that queries the database and verifies that the schema matches the intended one, including that the correct indexes have been created. A rough sketch follows.
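Such a test could use SQLAlchemy's inspection API; a rough sketch, with the `engine` fixture assumed and not part of this PR:

```python
import sqlalchemy as sa

def test_schema_matches_intended(engine):
    insp = sa.inspect(engine)
    # All expected tables exist
    assert {'sessions', 'testcases', 'metadata'} <= set(insp.get_table_names())

    # Time-based indexes exist on both tables
    session_indexes = {ix['name'] for ix in insp.get_indexes('sessions')}
    assert 'index_sessions_time' in session_indexes

    testcase_indexes = {ix['name'] for ix in insp.get_indexes('testcases')}
    assert 'index_testcases_time' in testcase_indexes
```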

Comment on lines -133 to -143
```python
@contextlib.contextmanager
def _db_read(self, *args, **kwargs):
    with self.__db_lock.read_lock():
        with self._db_connect(*args, **kwargs) as conn:
            yield conn

@contextlib.contextmanager
def _db_write(self, *args, **kwargs):
    with self.__db_lock.write_lock():
        with self._db_connect(*args, **kwargs) as conn:
            yield conn
```
vkarak (Contributor) commented:

How can we guarantee a similar locking mechanism in the new design? These locks are needed to avoid "database is locked" errors when multiple ReFrame instances update the DB using the SQLite backend. More generally, the redesign should allow extending the underlying backend capabilities with extra functionality, as here with explicit read/write locking.
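One possible shape for this, sketched under the assumption that the file lock is kept as a SQLite-specific detail behind the connector interface; the hook name and lock attribute are illustrative, not from the PR:

```python
import contextlib

class _SqliteConnector(_ConnectionStrategy):
    @contextlib.contextmanager
    def write_context(self, engine):
        # Serialize writers across ReFrame instances with the existing
        # file lock to avoid 'database is locked' errors.
        with self._db_lock.write_lock():
            with engine.begin() as conn:
                yield conn

class _PostgresConnector(_ConnectionStrategy):
    @contextlib.contextmanager
    def write_context(self, engine):
        # Server-side backends rely on the database's own concurrency
        # control; no extra locking needed.
        with engine.begin() as conn:
            yield conn
```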


vkarak commented Oct 11, 2025

Also, regarding this:

Implement database-generated IDs for session UUIDs

I don't believe this is necessary for two reasons:

  1. The probability of collision for v4 UUIDs, such as those we generate, is extremely low.
  2. The DB feature is optional, but UUIDs are still needed and generated in the report. We also want to be able to generate a DB from existing JSON reports.

@vkarak vkarak marked this pull request as draft October 11, 2025 21:00

mredenti commented Oct 12, 2025

Also, how easy would it be with this design to add another backend, such as DuckDB?

With this refactor, introducing a new SQL-based backend (e.g. DuckDB) requires implementing a new connector subclass:

```python
class _DuckConnector(_ConnectionStrategy):
    def __init__(self):
        ...

    def _build_connection_url(self):
        return URL.create(
            drivername="duckdb",
            database="/path/to/results.duckdb"
        ).render_as_string()
```

The idea is indeed to be able to add a new backend in an isolated manner, independently of the storage and SQL logic. The _ConnectionStrategy base class defines the abstract interface for all SQL dialects: each connector only needs to provide the SQLAlchemy connection URL, optionally _connection_kwargs (e.g. connection timeout, driver-specific options), and a json_column_type property declaring the JSON blob type for that dialect (for instance, PostgreSQL supports TEXT, JSON and JSONB).
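A sketch of that json_column_type hook, following the description above (base default Text, Postgres override JSONB); the surrounding class bodies are elided:

```python
from sqlalchemy import Text
from sqlalchemy.dialects.postgresql import JSONB

class _ConnectionStrategy:
    @property
    def json_column_type(self):
        return Text    # portable default for the JSON blob column

class _PostgresConnector(_ConnectionStrategy):
    @property
    def json_column_type(self):
        return JSONB   # native binary JSON type on PostgreSQL
```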

Now, once a new dialect has been introduced, the only additional change required in storage.py is a new branch in the StorageBackend.create() factory method:

```python
elif backend == 'duckdb':
    return _SqlStorage(_DuckConnector(), *args, **kwargs)
```

Commit e162185 shows this for PostgreSQL.

Always serialize the report object to JSON text using `jsonext.dumps()`
before inserting into the database. This prevents `TypeError: Object of
type RunReport is not JSON serializable` errors when using the PostgreSQL
backend (JSONB column). For SQLite, we continue storing the JSON string.
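Illustrative write and read paths for this behaviour, assuming the sessions table from the schema and ReFrame's jsonext module; the helper names here are made up:

```python
import json
import reframe.utility.jsonext as jsonext

def store_report(conn, sessions_table, session_uuid, report):
    # Always insert a JSON *string*: works for both the SQLite TEXT
    # column and the PostgreSQL JSONB column.
    conn.execute(sessions_table.insert().values(
        uuid=session_uuid, json_blob=jsonext.dumps(report)
    ))

def normalize_blob(value):
    # JSONB values may come back as dicts/lists while TEXT comes back
    # as str; re-serialize non-strings so downstream parsing always
    # sees a JSON string (mirrors the _decode_sessions() normalization).
    return value if isinstance(value, str) else json.dumps(value)
```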
